An Effective and Robust Framework for Transliteration Exploration

نویسندگان

Ea-Ee Jan

Niyu Ge

Shih-Hsiang Lin

Berlin Chen

چکیده

Transliteration is the process of proper name translation based on pronunciation. It is an important process in many multilingual natural language tasks. A common and essential component of transliteration approaches is a verification mechanism that tests if the two names in different languages are translations of each other. Although many transliteration systems have verification as a component, verification as a stand-alone problem is relatively new. In this paper, we propose a simple, effective and robust training framework for the task of verification. We show the many applications of the verification techniques. Our proposed method can operate on both phonemic and orthographic inputs. Our best results show that a simple, straightforward orthographic representation is sufficient and no complex training method is needed. It is effective because it achieves remarkable accuracies. It is robust because it is language-independent. We show that on Chinese and Korean our technique achieves equal error rate well below 1% and around 1% for Japanese using 2009 and 2010 NEWS transliteration generation share task dataset. Our results also show that the orthographic system outperforms the phonemic system. This is especially encouraging because the orthographic inputs are easier to generate and secondly, one does not need to resort to more complex training algorithm to achieve excellent results. This approach is integrated for proper name based cross lingual information retrieval without translation. 1 # This work was partially sponsored by "Aim for the Top University Plan" of National Taiwan Normal University and Ministry of Education, Taiwan.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Comparison of Different Machine Transliteration Models

Machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Four machine transliteration models – grapheme-based translit...

متن کامل

A Hybrid Model for Extracting Transliteration Equivalents from Parallel Corpora

Several models for transliteration pair acquisition have been proposed to overcome the out-of-vocabulary problem caused by transliterations. To date, however, there has been little literature regarding a framework that can accommodate several models at the same time. Moreover, there is little concern for validating acquired transliteration pairs using up-to-date corpora, such as web documents. ...

متن کامل

Learning Transliteration Lexicons from the Web

This paper presents an adaptive learning framework for Phonetic Similarity Modeling (PSM) that supports the automatic construction of transliteration lexicons. The learning algorithm starts with minimum prior knowledge about machine transliteration, and acquires knowledge iteratively from the Web. We study the active learning and the unsupervised learning strategies that minimize human supervis...

متن کامل

Robust Transliteration Mining from Comparable Corpora with Bilingual Topic Models

We present a high-precision, languageindependent transliteration framework applicable to bilingual lexicon extraction. Our approach is to employ a bilingual topic model to enhance the output of a state-of-the-art graphemebased transliteration baseline. We demonstrate that this method is able to extract a high-quality bilingual lexicon from a comparable corpus, and we extend the topic model to p...

متن کامل

CRFA-CRBM: a hybrid technique for anomaly recognition in regional geochemical exploration; case study: Dehsalm area, east of Iran

Identification of geochemical anomalies is a significant step during regional geochemical exploration. In this matter, new techniques have been developed based on deep learning networks. These simple-structure-networks act like our brains on processing the data by simulating deep layers of thinking. In this paper, a hybrid compositional-deep learning technique was applied to identify the anomal...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2011

An Effective and Robust Framework for Transliteration Exploration

نویسندگان

چکیده

منابع مشابه

A Comparison of Different Machine Transliteration Models

A Hybrid Model for Extracting Transliteration Equivalents from Parallel Corpora

Learning Transliteration Lexicons from the Web

Robust Transliteration Mining from Comparable Corpora with Bilingual Topic Models

CRFA-CRBM: a hybrid technique for anomaly recognition in regional geochemical exploration; case study: Dehsalm area, east of Iran

عنوان ژورنال:

اشتراک گذاری